Meta-event generation based on time attributes

ABSTRACT

First stage meta-events are generated based on analyzing time attributes of base events received from a network component. Second stage meta-events are generated based on a number of the first stage meta-events that have a time attribute falling within a time period. An amount of time that has passed since a most-recent second stage meta-event was generated is determined, and if a threshold time period does not exceed the amount of time that has passed since the most-recent second stage meta-event was detected, a third stage meta-event is determined.

PRIORITY

This application is a continuation of U.S. patent application Ser. No. 10/308,767, filed Dec. 2, 2002, entitled “CORRELATION ENGINE WITH SUPPORT FOR TIME-BASED RULES”, which is incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to a computer-based system for the identification and processing of security events from heterogeneous sources, including a correlation engine with support for time-based rules.

BACKGROUND

Computer networks and systems have become indispensable tools for modern business. Today terabits of information on virtually every subject imaginable are stored in and accessed across such networks by users throughout the world. Much of this information is, to some degree, confidential and its protection is required. Not surprisingly then, intrusion detection systems (IDS) have been developed to help uncover attempts by unauthorized persons and/or devices to gain access to computer networks and the information stored therein.

Intrusion detection may be regarded as the art of detecting inappropriate, incorrect or anomalous activity within or concerning a computer network or system. The most common approaches to intrusion detection are statistical anomaly detection and pattern-matching detection. IDS that operate on a host to detect malicious activity on that host are called host-based IDS (and may exist in the form of host wrappers/personal firewalls or agent-based software), and those that operate on network data flows are called network-based IDS. Host-based intrusion detection involves loading software on the system (the host) to be monitored and using log files and/or the host's auditing agents as sources of data. In contrast, a network-based intrusion detection system monitors the traffic on its network segment and uses that traffic as a data source. Packets captured by the network interface cards are considered to be of interest if they match a signature.

Regardless of the data source, there are two complementary approaches to detecting intrusions: knowledge-based approaches and behavior-based approaches. Almost all IDS tools in use today are knowledge-based. Knowledge-based intrusion detection techniques involve comparing the captured data to information regarding known techniques to exploit vulnerabilities. When a match is detected, an alarm is triggered. Behavior-based intrusion detection techniques, on the other hand, attempt to spot intrusions by observing deviations from normal or expected behaviors of the system or the users (models of which are extracted from reference information collected by various means). When a suspected deviation is observed, an alarm is generated.

Advantages of the knowledge-based approaches are that they have the potential for very low false alarm rates, and the contextual analysis proposed by the intrusion detection system is detailed, making it easier for a security officer using such an intrusion detection system to take preventive or corrective action. Drawbacks include the difficulty in gathering the required information on the known attacks and keeping it up to date with new vulnerabilities and environments.

Advantages of behavior-based approaches are that they can detect attempts to exploit new and unforeseen vulnerabilities. They are also less dependent on system specifics. However, the high false alarm rate is generally cited as a significant drawback of these techniques, and because behaviors can change over time, the incidence of such false alarms can increase.

With both knowledge-based and behavior-based systems, matches are detected with the aid of a rules engine. Many current rules engines implement a standard RETE algorithm because the rules engine's performance is demonstrably independent of the number of rules that are used.

Regardless of whether a host-based or a network-based implementation is adopted and whether that implementation is knowledge-based or behavior-based, an intrusion detection system is only as useful as its ability to discriminate between normal system usage and true intrusions (accompanied by appropriate alerts). If intrusions can be detected and the appropriate personnel notified in a prompt fashion, measures can be taken to avoid compromises to the protected system. Otherwise such safeguarding cannot be provided. Accordingly, what is needed is a system that can provide accurate and timely intrusion detection and alert generation so as to effectively combat attempts to compromise a computer network or system.

SUMMARY OF INVENTION

A rules engine with support for time-based rules is disclosed. A method performed by the rules engine comprises receiving security events generated by a number of network devices. The security events are aggregated. One or more time-based rules are provided to a RETE engine. The aggregated security events are provided to the RETE engine at specific times associated with the time-based rules. The security events are cross-correlated with the one or more time-based rules, and one or more first stage meta-events are reported.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not limitation, in the figures of the accompanying drawings in which:

FIG. 1 illustrates one embodiment of a computer-based system for capturing, normalizing and reporting security events from heterogeneous sources configured in accordance with the present invention;

FIG. 2 illustrates procedures followed by an agent configured in accordance with an embodiment of the present invention when collecting, normalizing and reporting security event data;

FIG. 3 illustrates procedures followed by a manager configured in accordance with an embodiment of the present invention when analyzing security event data and generating alerts based thereon;

FIG. 4 illustrates one embodiment of a rules engine in accordance with the present invention; and

FIGS. 5A and 5B illustrate a set of procedures followed by a rules engine in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

Described herein is a computer-based system for the identification and processing of security events from heterogeneous sources, including a correlation engine with support for time-based rules. The system (one embodiment of which is manifest as computer software) implements a method that comprises receiving security events generated by a number of network devices. The security events are aggregated. One or more time-based rules are provided to a RETE engine. The aggregated security events are provided to the RETE engine at specific times associated with the time-based rules. The security events are cross-correlated with the one or more time-based rules, and one or more first stage meta-events are reported.

Although the present system will be discussed with reference to various illustrated examples, these examples should not be read to limit the broader spirit and scope of the present invention. For example, the examples presented herein describe distributed agents, managers and consoles, which are but one embodiment of the present invention. The general concepts and reach of the present invention are much broader and may extend to any computer-based or network-based security system. Also, examples of the messages that may be passed to and from the components of the system and the data schemas that may be used by components of the system are given in an attempt to further describe the present invention, but are not meant to be all-inclusive examples and should not be regarded as such.

Some portions of the detailed description that follows are presented in terms of algorithms and symbolic representations of operations on data within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the computer science arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, it will be appreciated that throughout the description of the present invention, use of terms such as “processing”, “computing”, “calculating”, “determining”, “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

As indicated above, one embodiment of the present invention is instantiated in computer software, that is, computer readable instructions, which, when executed by one or more computer processors/systems, instruct the processors/systems to perform the designated actions. Such computer software may be resident in one or more computer readable media, such as hard drives, CD-ROMs, DVD-ROMs, read-only memory, read-write memory and so on. Such software may be distributed on one or more of these media, or may be made available for download across one or more computer networks (e.g., the Internet). Regardless of the format, the computer programming, rendering and processing techniques discussed herein are simply examples of the types of programming, rendering and processing techniques that may be used to implement aspects of the present invention. These examples should in no way limit the present invention, which is best understood with reference to the claims that follow this description.

Referring now to FIG. 1, an example of a computer-based system 10 architected in accordance with an embodiment of the present invention is illustrated. System 10 includes agents 12, one or more managers 14 and one or more consoles 16 (which may include browser-based versions thereof). In some embodiments, agents, managers and/or consoles may be combined in a single platform or distributed in two, three or more platforms (such as in the illustrated example). The use of this multi-tier architecture supports scalability as a computer network or system grows.

Agents 12 are software programs that provide efficient, real-time (or near real-time) local event data capture and filtering from a variety of network security devices and/or applications. The primary sources of security events are common network elements including firewalls, intrusion detection systems and operating system logs. Agents 12 can collect events from any source that produces event logs or messages and can operate at the native device, at consolidation points within the network, and/or through simple network management protocol (SNMP) traps.

Managers 14 are server-based components that further consolidate, filter and cross-correlate events received from the agents, employing a rules engine 18 and a centralized event database 20. One role of manager 14 is to capture and store all of the real-time and historic event data to construct (via database manager 22) a complete, enterprise-wide picture of security activity. The manager 14 also provides centralized administration, notification (through one or more notifiers 24), and reporting, as well as a knowledge base 28 and case management workflow. The manager 14 may be deployed on any computer hardware platform and one embodiment utilizes an Oracle™ database. Communications between manager 14 and agents 12 may be bi-directional (e.g., to allow manager 14 to transmit commands to the platforms hosting agents 12) and encrypted. In some installations, managers 14 may act as concentrators for multiple agents 12 and can forward information to other managers (e.g., deployed at a corporate headquarters).

Consoles 16 are computer- (e.g., workstation-) based applications that allow security professionals to perform day-to-day administrative and operation tasks such as event monitoring, rules authoring, incident investigation and reporting. Access control lists allow multiple security professionals to use the same system and event database, with each having their own views, correlation rules, alerts, reports and knowledge base appropriate to their responsibilities. A single manager 14 can support multiple consoles 16.

In some embodiments, a browser-based version of the console 16 may be used to provide access to security events, knowledge base articles, reports, notifications and cases. That is, the manager 14 may include a web server component accessible via a web browser hosted on a personal computer (which takes the place of console 16) to provide some or all of the functionality of a console 16. Browser access is particularly useful for security professionals that are away from the consoles 16 and for part-time users. Communication between consoles 16 and manager 14 is bi-directional and may be encrypted.

Through the above-described architecture the present invention can support a centralized or decentralized environment. This is useful because an organization may want to implement a single instance of system 10 and use an access control list to partition users. Alternatively, the organization may choose to deploy separate systems 10 for each of a number of groups and consolidate the results at a “master” level. Such a deployment can also achieve a “follow-the-sun” arrangement where geographically dispersed peer groups collaborate with each other by passing primary oversight responsibility to the group currently working standard business hours. Systems 10 can also be deployed in a corporate hierarchy where business divisions work separately and support a rollup to a centralized management function.

Examining each of the various components in further detail, we begin with the agents 12. Agents 12 are used to collect, reduce and normalize the enormous amount of data that is generated by a network's security devices before a manager 14 acts on the data. As will become evident, this process goes beyond simple log consolidation. Before presenting those details, however, and to understand why such measures are desirable, some background regarding how analysts currently cope with security event information generated by multiple network devices is useful.

Conventional intrusion detection systems can help an analyst detect an attack directed at a network resource such as a server. Usually, such investigations are launched in response to an alert generated by the IDS. As a first step after receiving such an alert, an analyst might review perimeter router logs to see if a router associated with the network passed a packet that triggered the alert. If such a packet were discovered, the analyst would likely then want to review one or more firewall logs to see if any existing filters blocked the suspect packet. Assume, for the sake of this example, the suspect packet got past any firewalls; further investigation would be necessary to determine whether the integrity of the server itself was compromised. Such an integrity check may be performed using a conventional software application such as Tripwire, which is a file integrity checker employing MD5 checksums, to see which files, if any, had been accessed or modified. Finally, the analyst may have to examine a Syslog or an EventLog from the subject server, as well as any tcpdump data collected by a dedicated tcpdump host, for the segment of time surrounding the attack to determine what actually happened.

By this time the analyst has accessed many different systems and looked at several different types of logs in an effort to distill a comprehensive view of the attack. This can be a significant amount of work, and time taken in such review and analysis is time lost from the vitally important tasks of securing the network and restoring the compromised server to make sure that no other systems will be affected. The present invention helps to minimize the time spent on such analysis by consolidating all the relevant information in a single logging facility, allowing the analyst to look at the data in whatever sequence or depth he or she requires.

More than just consolidation, though, the present agents 12 provide data normalization, which is of great benefit when an analyst must deal with security incidents in a heterogeneous network environment. To understand why normalization is helpful, consider a typical enterprise environment, which consists of many different types of network devices ranging from border routers and VPN devices, to firewalls and authentication servers, and a wide range of application servers such as web servers, e-mail servers and database servers. Each of these devices generates logs that, as described above, are sources of data to a security analyst. However, it is seldom, if ever, the case that two manufacturers will use the same event logging mechanism or format their event logs identically. For example, a Cisco Systems PIX™ firewall will not report an accepted packet in the same way as a Check Point firewall or even in the same fashion as a Cisco Systems router.

An example of the types of various reports that might be generated by different network devices is presented below in Table 1, which shows examples of logs from different network devices, each reporting the same packet traveling across a network. In particular, these logs represent a remote printer buffer overflow that connects to IIS servers over port 80.

TABLE 1 Examples of Event Logs for Different Network Devices.

Network Device: Check Point firewall
Event Log: "14" "21Dec2001" "12:10:29" "eth-s1p4c0" "ip.of.firewall" "log" "accept" "www-http" "65.65.65.65" "10.10.10.10" "tcp" "4" "1355" " " " " " " " " " " " " " " " " " " "firewall" " len 68"

Network Device: Cisco Systems router
Event Log: Dec 21 12:10:27: %SEC-6-IPACCESSLOGP: list 102 permitted tcp 65.65.65.65(1355) -> 10.10.10.10(80), 1 packet

Network Device: Cisco Systems PIX
Event Log: Dec 21 2001 12:10:28: %PIX-6-302001: Built inbound TCP connection firewall 125891 for faddr 65.65.65.65/1355 gaddr 10.10.10.10/80 laddr 10.0.111.22/80

Network Device: Snort
Event Log: [**] [1:971:1] WEB-IIS ISAPI .printer access [**] [Classification: Attempted Information Leak] [Priority: 3] 12/21-12:10:29.100000 65.65.65.65:1355 -> 10.10.10.10:80 TCP TTL:63 TOS:0x0 ID:5752 IpLen:20 DgmLen:1234 DF ***AP*** Seq: 0xB13810DC Ack: 0xC5D2E066 Win: 0x7D78 TcpLen: 32 TCP Options (3) => NOP NOP TS: 493412860 0 [Xref => http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2001-0241] [Xref => http://www.whitehats.com/info/IDS533]

The Check Point record contains the following fields: event id, date, time, firewall interface, IP address of the firewall interface, logging facility, action, service, source IP, target IP, protocol, source port, some Check Point specific fields and then the size of the datagram. This report is, to say the least, difficult for a human analyst to read (especially with all the empty fields that are represented by double quotes). The Cisco router has a different format: date, time, logging facility, event name, source IP, source port, target address, target port, and number of packets. The Cisco PIX firewall, which is produced by the same manufacturer as the router, uses yet another format: date, time, event name, source IP, source port, translated address or target address, target port, local address, and local port.

The final record is a Snort alert that claims this traffic was malicious. Snort is a well-known IDS and the fields it populates are: exploit or event name, classification, priority, date, time, source IP, source port, target IP, target port, protocol, TTL (time to live), type of service, ID, IP length, datagram length, tcp flags, sequence number, acknowledgement number, window size, and tcp length. Snort also reports additional data such as references to investigate the exploit.

Agents 12 may be deployed in connection with some or all of these (and other) network components and applications. For example, in FIG. 1, agent 12a is deployed in connection with an IDS (such as Snort). Agent 12b is deployed in connection with a firewall (such as the Check Point firewall and/or the Cisco PIX firewall). Agent 12c is deployed in connection with other network components or agents (e.g., a router). Each of these agents receives the event information from its associated network device or application in that device's or application's native format and converts (or normalizes) the information to a common schema. This normalization allows for later storage of the event information in a format that can more readily be utilized by an analyst.

Many normalized schemas can be used and, in general, choosing the fields of a common schema may be based on content rather than semantic differences between device logs and/or manufacturers. To accomplish this normalization, agents 12 are equipped with a parser configured to extract values from the events as reported by the individual network devices/applications and populate the corresponding fields in the normalized schema. Table 2 is an example of a normalized schema for the data reported by the devices in Table 1.

TABLE 2 Common Schema Representation of Event Data

Date           Time      Event Name                     Src_IP       Src_Port  Tgt_IP       Trg_Port  Device Type   Additional data
Dec. 21, 2001  12:10:29  accept                         65.65.65.65  1355      10.10.10.10  80        Check Point
Dec. 21, 2001  12:10:27  list 102 permitted tcp         65.65.65.65  1355      10.10.10.10  80        Cisco Router
Dec. 21, 2001  12:10:28  built inbound tcp connection   65.65.65.65  1355      10.10.10.10  80        Cisco PIX
Dec. 21, 2001  12:10:29  WEB-IIS ISAPI .printer access  65.65.65.65  1355      10.10.10.10  80        Snort         TCP TTL: 63 TOS: 0x0 ID: 5752 IpLen: 20 DgmLen: 1234 DF ***AP*** Seq: 0xB13810DC Ack: 0xC5D2E066 Win: 0x7D78 TcpLen: 32 TCP Options (3) => NOP NOP TS: 493412860 0
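By way of illustration only, the parsing step performed by an agent 12 might be sketched as follows for the Cisco router record of Table 1. The class name NormalizedEvent, the function parse_cisco_router and the regular expression are assumptions made for this sketch and are not taken from any described embodiment; the field names follow Table 2. The router record carries no year, so the date is passed through as reported.

```python
import re
from dataclasses import dataclass


@dataclass
class NormalizedEvent:
    """Hypothetical container mirroring the columns of Table 2."""
    date: str
    time: str
    event_name: str
    src_ip: str
    src_port: int
    tgt_ip: str
    tgt_port: int
    device_type: str
    additional_data: str = ""


# Assumed pattern for the router record shown in Table 1 (ASCII "->" arrow).
CISCO_ROUTER_RE = re.compile(
    r"(?P<date>\w{3} +\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}): ?"
    r"%SEC-6-IPACCESSLOGP: (?P<name>list \d+ \w+ \w+) "
    r"(?P<src_ip>[\d.]+)\((?P<src_port>\d+)\) -> ?"
    r"(?P<tgt_ip>[\d.]+)\((?P<tgt_port>\d+)\)"
)


def parse_cisco_router(line: str) -> NormalizedEvent:
    """Extract values from one router log line and populate the common schema."""
    m = CISCO_ROUTER_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized router log line: {line!r}")
    return NormalizedEvent(
        date=m["date"],
        time=m["time"],
        event_name=m["name"],
        src_ip=m["src_ip"],
        src_port=int(m["src_port"]),
        tgt_ip=m["tgt_ip"],
        tgt_port=int(m["tgt_port"]),
        device_type="Cisco Router",
    )


sample = ("Dec 21 12:10:27: %SEC-6-IPACCESSLOGP: list 102 permitted tcp "
          "65.65.65.65(1355) -> 10.10.10.10(80), 1 packet")
event = parse_cisco_router(sample)
```

A separate parser of this kind would be needed for each device format, with all of them emitting the same record type.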

Table 2 reports the same four events described earlier, this time in a normalized fashion. Each of the agents 12 is configured to extract the relevant data from events reported by its associated network device/application and map that data to the corresponding common schema representation. For instance, the Check Point firewall reports a target port as www-http, not as port 80 as is the case for most other network devices. Therefore an agent 12 associated with the Check Point firewall is configured with an appropriate lookup mechanism (e.g., a table) to ensure that “www-http” as reported by the firewall gets translated into “port 80” when the agent 12 reports the event to the manager 14.
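A minimal sketch of such a lookup mechanism, assuming a simple in-memory table, is given below. The names SERVICE_TO_PORT and normalize_target_port are hypothetical; only the “www-http” entry comes from the example above, and a deployed table would carry whatever additional service names the firewall actually reports.

```python
# Hypothetical lookup table; only the "www-http" entry is taken from the text.
SERVICE_TO_PORT = {
    "www-http": 80,
}


def normalize_target_port(value: str) -> int:
    """Translate a device-reported service name or port string to a port number."""
    value = value.strip()
    if value.isdigit():
        # The device already reported a numeric port.
        return int(value)
    try:
        return SERVICE_TO_PORT[value.lower()]
    except KeyError:
        raise ValueError(f"no port mapping for service {value!r}") from None
```

Raising on an unknown name, rather than guessing, leaves the choice of fallback (drop, pass through, or flag the event) to the agent configuration.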

Similarly, the agents 12 may need to be configured to convert the date/time stamp formats used by the various network devices/applications into a common date/time representation. That is, because the different network devices/applications all use different date/time formats, the agents cannot simply report the date/time stamps reported by the device/application. Instead, the agents 12 may be configured to convert local date/time stamps to a universal date/time notation, such as Greenwich Mean Time.
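The conversion might be sketched as follows using the Check Point and PIX stamp layouts of Table 1. The function name to_utc, the per-device format strings and the UTC-8 offset are assumptions for illustration; the logs themselves do not state a time zone, so the offset would have to come from agent configuration. Month-name parsing with %b assumes an English locale.

```python
from datetime import datetime, timedelta, timezone

# Assumed strptime layouts for two of the devices in Table 1.
CHECK_POINT_FMT = "%d%b%Y %H:%M:%S"   # e.g. 21Dec2001 12:10:29
CISCO_PIX_FMT = "%b %d %Y %H:%M:%S"   # e.g. Dec 21 2001 12:10:28


def to_utc(stamp: str, fmt: str, utc_offset_hours: float) -> str:
    """Convert a device-local stamp to a common UTC (GMT) representation."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(stamp, fmt).replace(tzinfo=local_zone)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical device located at UTC-8.
check_point_utc = to_utc("21Dec2001 12:10:29", CHECK_POINT_FMT, -8)
pix_utc = to_utc("Dec 21 2001 12:10:28", CISCO_PIX_FMT, -8)
```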

In addition to normalizing event data by fields, agents 12 can parse the event data stream and set field values based on conventions and practices of the organization. For example, the variety of event severity levels that devices produce can all be normalized at the agent level into a single, consistent hierarchy.

Thus, agents 12 collect and process events generated by heterogeneous network devices/applications throughout an enterprise. Alerts can come from routers, e-mail logs, anti-virus products, firewalls, intrusion detection systems, access control servers, VPN systems, NT Event Logs, Syslogs, and other sources where security threat information is detected and reported. In some embodiments, each event generator has an agent 12 assigned to collect all relevant security information, while in other embodiments agents are shared among two or more event generators. Thus, depending on the device/application to be monitored and the in-place infrastructure, a choice is provided for simple log parsing and loading, network listening (e.g., through SNMP traps), installation on aggregation points (Syslog servers and concentrators) and full distribution to all security-relevant devices.

In addition to collecting and normalizing data from security devices, the agents 12 intelligently manage the data with:

-   Filtering: Each agent 12 can be configured according to conditions by which data will be collected and sent to the manager 14. This helps to reduce the need to collect and manage large volumes of unwanted data.
-   Aggregation: Based on the time period selected, the agents 12 can collect duplicate alerts but send only a single message with a count of the total number of such alerts to the manager 14. This helps reduce the amount of traffic transmitted across the network.
-   Batching: Agents 12 can be configured to send a collection of alerts at one time rather than sending alerts immediately after each occurrence.
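The three data management functions listed above might be sketched as follows. The function names, the choice of key fields that define a "duplicate" alert, and the dictionary event representation are assumptions for this sketch; the caller is assumed to pass in the alerts collected during one selected time period.

```python
from collections import Counter

# Assumed fields that make two alerts "duplicates" of each other.
KEY_FIELDS = ("event_name", "src_ip", "tgt_ip", "tgt_port")


def filter_events(events, condition):
    """Filtering: keep only events that satisfy the configured condition."""
    return [e for e in events if condition(e)]


def aggregate(events, key_fields=KEY_FIELDS):
    """Aggregation: one message per distinct alert, carrying a count."""
    counts = Counter(tuple(e[f] for f in key_fields) for e in events)
    return [dict(zip(key_fields, key), count=n) for key, n in counts.items()]


def batches(messages, size):
    """Batching: yield messages in groups of at most `size`."""
    for i in range(0, len(messages), size):
        yield messages[i:i + size]


period_events = [
    {"event_name": "accept", "src_ip": "65.65.65.65", "tgt_ip": "10.10.10.10", "tgt_port": 80},
    {"event_name": "accept", "src_ip": "65.65.65.65", "tgt_ip": "10.10.10.10", "tgt_port": 80},
    {"event_name": "accept", "src_ip": "65.65.65.65", "tgt_ip": "10.10.10.10", "tgt_port": 25},
]
wanted = filter_events(period_events, lambda e: e["tgt_port"] == 80)
messages = aggregate(wanted)
```

Here the two identical port 80 alerts collapse into a single message with a count of 2, which is the traffic reduction the aggregation item describes.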

FIG. 2 illustrates the various processes performed by agents 12 from the point of view of the event information. Initially, at step 30, the raw event information is received or collected from the native network device or application in that device's/application's native format. At this point (or, optionally, following normalization), data filters may be applied to reduce the volume of data being passed for further analysis (step 32). Such filtering is optional and may involve assessing the captured data against one or more conditions to determine whether or not the data is relevant for further analysis.

Thereafter, the event data is normalized at step 34. As indicated above, the normalization may occur at the field and/or the field value level. Further, the normalization may involve translation of the field values into nomenclatures/formats used across an enterprise.

Following normalization, the event data may, optionally, be aggregated (step 36) before being transmitted to the manager 14 (step 38). The transmissions may occur as the events are captured or may be made on a batched basis. In either case, the messages used to transmit the event data preferably include all of the source fields of an event. By delivering the entire event data set (i.e., all of the source fields) organized in a consistent format (i.e., the common schema), powerful upstream data management, cross-correlation, display and reporting are available to the security team. In some embodiments the event data is discarded after successful transmission to the manager 14, but in other cases the data may be cached for a time at the agent 12 to permit later replay of the data.

Referring again to FIG. 1, the manager 14 includes one or more agent managers 26, which are responsible for receiving the event data messages transmitted by the agents 12. Where bi-directional communication with the agents 12 is implemented, these agent managers 26 may be used to transmit messages to the agents 12. If encryption is employed for agent-manager communications (which is optional), the agent manager 26 is responsible for decrypting the messages received from agents 12 and encrypting any messages transmitted to the agents 12.

Once the event data messages have been received (and if necessary decrypted), the event data is passed to the rules engine 18. Rules engine 18 is at the heart of the manager 14 and is used to cross-correlate the event data with security rules in order to generate meta-events. Meta-events, in the context of the present invention, are instances of (usually) multiple individual event data elements (gathered from heterogeneous sources) that collectively satisfy one or more rule conditions such that an action is triggered. Stated differently, the meta-events represent information gathered from different sensors and presented as correlated results (i.e., the decision output of the rules engine 18 indicating that different events from different sources are associated with a common incident as defined by one or more rules).

The actions triggered by the rules may include notifications transmitted (e.g., via notifier 24) to designated destinations (e.g., security analysts may be notified via the consoles 16, e-mail messages, a call to a telephone, cellular telephone, voicemail box and/or pager number or address, or by way of a message to another communication device and/or address such as a facsimile machine, etc.) and/or instructions to network devices (e.g., via agents 12 or via external scripts or programs to which the notifier 24 may pass arguments) to take action to thwart a suspected attack (e.g., by reconfiguring one or more of the network devices, and/or modifying or updating access lists, etc.). The information sent with the notification can be configured to include the most relevant data based on the event that occurred and the requirements of the analyst. In some embodiments, unacknowledged notifications will result in automatic retransmission of the notification to another designated operator.

As discussed below, when meta-events are generated by the rules engine 18, on-screen notifications may be provided to consoles 16 to prompt users to open cases for investigation of the events which led to the notification. This may include accessing knowledge base 28 to gather information regarding similar attack profiles and/or to take action in accordance with specified procedures. The knowledge base 28 contains reference documents (e.g., in the form of web pages and/or downloadable documents) that provide a description of the threat, recommended solutions, reference information, company procedures and/or links to additional resources. Indeed, any information can be provided through the knowledge base 28. By way of example, these pages/documents can have as their source: user-authored articles, third-party articles, and/or security vendors' reference material.

The rules engine 18 is based on a RETE engine configured to preserve event information state over configurable time windows so as to provide correlation of the event data according to specified rules. Correlation is generally regarded as a process of bringing information items into mutual relation. In the context of the present invention, correlation through rules engine 18 provides the ability to access, analyze, and relate different attributes of events from multiple sources to bring something to the attention of an analyst that might (or likely would) have otherwise gone unnoticed. In other words, the rules engine 18 provides the ability to determine what type of incident is represented by a collection of events reported by a number of heterogeneous network devices and/or applications. Because the collected event data is normalized into a common event schema, correlation can be performed utilizing any field including, but not limited to, geography, device type, source, target, time thresholds, and/or event type. Based on alerts generated by the rules engine 18, operators are provided with a workflow for investigating these incidents.

Turning to FIG. 3, the manager 14 receives (step 40) and analyzes (step 42) the event data reported by agents 12 in real-time (or near real-time owing to network latencies and depending upon whether or not batched message transmission is used) according to a set of flexible rules. The rules define which events generate an alert, when those events generate an alert, and what actions are associated with the alert. Hence, the rules may be written to contain event conditions, thresholds, and actions. In some embodiments the rule conditions may be specified using Boolean operators and/or database queries. When incoming events match a particular rule's conditions and thresholds, causing a meta-event to be generated (step 44), the rule automatically fires the action that has been defined (step 46). Such actions can include, but are not limited to: executing a pre-determined command or script, logging the alert, sending the alert to the consoles 16, sending the alert to notification designees, setting custom severity levels for the alert based on cumulative activity, adding a source to a suspicious list or a target to a vulnerable list, and/or a combination of these actions.

Rules may be created at the manager 14 and/or at the consoles 16 using a flexible scripting language. An example of a rule might be:

-   If (an ids evasion attack) occurs (from the same source ip address) (3 times) within (2 minutes) then (send message to console) and (notify the security supervisor via pager).

In this example, the incoming event data would be compared against the rule conditions and thresholds (in the above example 3 events that satisfy the condition of an IDS evasion attack are required and all must originate from a common source IP address and be detected within 2 minutes of each other), and if those criteria are satisfied the designated actions (here, sending an alert message to the consoles 16 and also notifying a security supervisor via a pager) would be performed. The correlation rules that operate on the events evaluate threats and attacks according to selected criteria (e.g., degree of threat, level of success, vulnerability of target and value of target) and generate alerts according to a security intelligence taxonomy that focuses attention on the most dangerous and potentially most damaging attacks. For example, threats to network assets that are deemed not to have succeeded or that are not likely to succeed may be coded green, while those that have succeeded or have a high probability of success might be coded red. The value of the security information taxonomy lies in its ability to eliminate false positives while clearly identifying real threats to vulnerable and valuable assets.
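
The threshold-within-a-window condition of the example rule can be illustrated with a short sketch. This is not the scripting language referred to above, only a minimal Python illustration; the event fields (`category`, `src_ip`, `time`) and the function name `make_rule` are assumptions made for the example, and events are assumed to arrive in time order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 120  # "within (2 minutes)"
THRESHOLD = 3         # "(3 times)"


def make_rule(window=WINDOW_SECONDS, threshold=THRESHOLD):
    """Build a stateful checker for the example rule (hypothetical helper)."""
    recent = defaultdict(deque)  # source IP -> timestamps of matching events

    def on_event(event):
        """Return True when this event causes the rule to fire."""
        if event["category"] != "ids_evasion":
            return False
        times = recent[event["src_ip"]]
        times.append(event["time"])
        # Discard timestamps that have fallen out of the time window.
        while times and event["time"] - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            times.clear()  # start counting afresh after firing
            return True
        return False

    return on_event
```

When `on_event` returns True, the caller would perform the designated actions (sending the console message and paging the supervisor); those actions are left out of the sketch.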

In general, the rules may be designed to capture threats and attacks that are typical in large, diverse networks and may be organized to provide multiple lines of defense by detecting specific activities and grouping them according to level of threat:

-   Reconnaissance: zone transfer, port scan, protocol, scanning, etc.
-   Suspicious: illegal outgoing traffic, unusual levels of alerts from the same host, etc.
-   Attack: overflow, IDS evasion, virus, denial of service, etc.
-   Successful: compromise of a backdoor, root compromise, covert channel exploit, etc.

Similar events and signatures may be grouped into rule categories that can be utilized by the rules to insulate the rule from changes in vendor-specific event details. For example, event names may change between product releases or new devices may be added to the network infrastructure with a new set of nomenclature. Since the rule categories map similar signatures into a single name that is used by the rules engine, if an individual network device changes taxonomy, only the mapping is changed, not the rule definition. Therefore, despite changes in individual devices, the investment in custom defined rules is preserved.
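
The insulation provided by rule categories can be sketched as a lookup table that sits between vendor-specific signatures and the rules. The vendor names, signature strings, and function names below are invented for illustration and do not come from any real product.

```python
# Hypothetical mapping from (vendor, signature) to a rule category.
SIGNATURE_TO_CATEGORY = {
    ("vendor_a", "TCP Portscan"): "reconnaissance",
    ("vendor_b", "PORT-SCAN-01"): "reconnaissance",
    ("vendor_a", "Fragmented evasion"): "attack",
}


def categorize(event, mapping=SIGNATURE_TO_CATEGORY):
    """Translate a vendor-specific signature into a rule category."""
    return mapping.get((event["vendor"], event["signature"]), "uncategorized")


def rule_matches(event):
    """A rule written against the category, not the vendor signature."""
    return categorize(event) == "reconnaissance"
```

If a vendor renames a signature in a later release, only an entry in `SIGNATURE_TO_CATEGORY` needs to be added or changed; `rule_matches` stays as written.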

After the events are processed by rules engine 18, the raw event data as well as any meta-events that were generated are stored in database 20 (step 48). In some embodiments, the raw event data may be stored prior to or concurrently with processing of the data by rules engine 18. Regardless of the sequence, such storage of the event data (and the meta-events generated by the rules engine 18) preserves a historical record of the event traffic and allows for replaying of the events through an existing or a new rule set (either at the manager 14 or the consoles 16) in order to assess the efficacy of new rules, for training purposes, and/or for case investigation.

Correlation via the rules ensures that credible threats and attacks come to the attention of the security staff on a high-priority basis. Hence, once an alert is received, the operator can perform in-depth analysis and take aggressive action secure in the knowledge that the effort is well spent. When a rule match is reported to a console 16, the analyst can quickly drill down (through an associated graphical user interface) to see all of the individual events that caused the rule to fire. If necessary, the analyst can investigate even further to see all of the individual data elements captured for those events.

When action is required, the present invention provides a full set of tools and services for the operator. Resources such as the rule definition, a knowledge base article containing company policies and recommended actions, and the development of a complete case docket describing the problem assist the operator in responding immediately to critical security threats. If necessary, the operator can proactively deal with an attack by launching specific applications or scripts from the console 16 to reconfigure device settings or change access privileges.

The console 16 provides a centralized view into the security status of an enterprise and gives administrators, analysts, and operators an interface to perform security management tasks. In various embodiments, the console provides event display in real-time or in replay mode (i.e., the ability to playback events from a given time period according to a VCR or DVD metaphor). Replay may be had from the events stored in database 20 or, in some instances, from caches associated with agents 12. This latter form of replay is especially useful because it provides improved simulation of actual network conditions as the events are played out across the same network as during the original attack.

Consoles 16 also provide operators with complete drill-down capability from the highest level of detail (e.g., the entire range of events) to the lowest level of detail (e.g., fields within a single event). This allows analysts to probe at whatever level of detail is required to gain further insight into an attack and assess vulnerability. This varying level of detailed analysis is made possible because the agents 12 report all of the event data fields, not merely a subset thereof. By way of example, one tool provides analysts with the ability to quickly see similar characteristics of events using a cursor control operation, such as a mouse click. For example, if analysts are presented with a meta-event alert that consists of, say, twenty or more individual events reported by several different agents associated with different network devices, the present user interface associated with consoles 16 allows the analyst to quickly visualize only the common fields of these events (e.g., a source IP address) by simply highlighting the events and performing a mouse click/select operation.
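
The "common fields" view can be thought of as an intersection over the selected events. The following minimal sketch assumes each event is a flat dictionary of field names to values; the function name `common_fields` is hypothetical.

```python
def common_fields(events):
    """Return the fields whose value is identical in every selected event."""
    if not events:
        return {}
    first, rest = events[0], events[1:]
    return {
        name: value
        for name, value in first.items()
        if all(name in e and e[name] == value for e in rest)
    }
```

For a selection of events that share only a source address, the result would contain just that one field, which is what the console would display.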

Once security personnel have been notified of a meta-event, they can utilize the knowledge base to determine the appropriate actions. In addition, security analysts may undertake investigations of events and/or meta-events. In general, such matters can be assigned to so-called cases. Stated differently, cases create a workflow and oversight environment for situations where there are suspicious events requiring further investigation. Once a case is created, it can be assigned to an operator, investigated, and resolved based on the business policies and practices of the enterprise (e.g., as documented in knowledge base 28). The security staff can also add narration and event information to a case, or view open cases to determine their status and any required next steps.

Consoles 16 also provide a front-end for the administration of the entire system 10. This may include system configuration such as setting up operators, notification, agent behavior, etc. User management (such as creating and modifying users, access, roles, and responsibilities), rules management (e.g., authoring, viewing, and updating rules), and workflow management (e.g., setting up the flow of actions taken when an event is received) may also be handled through the consoles 16. Finally, the consoles 16 allow for remote access, thus supporting divisional responsibility and “follow-the-sun” management.

Having thus described the elements of system 10, it is helpful to present a more in-depth look at rules engine 18. As described above, rules engine 18 is used to cross-correlate the event data with security rules in order to generate meta-events. FIG. 4 illustrates one embodiment of a rules engine 18 configured in accordance with the present invention. Rules engine 18 includes six components: a partial matcher 410, a memory manager 420, a RETE engine 430, a time tracker 440, a rules manager 490, and an action engine 450.

The partial matcher 410 receives events via the agent manager 26. The events may be gathered from different security devices. Partial matcher 410 determines which rules in the system are interested in a particular event it has received. An event is considered to be interesting if a rule used by the system mentions one or more attributes of the event. For example, an event can be considered interesting if the event has a particular source address from a particular subnet. Partial matcher 410 groups, batches, or aggregates interesting events together that are related to one another and to one or more conditions of a particular rule. Additionally, the partial matcher 410 is aware of time windows associated with a rule used by the system. By knowing the time window associated with a rule, partial matcher 410 can compute the last moment in time at which that rule remains interesting, and should be processed by RETE engine 430.
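
The two tasks described for the partial matcher, deciding which rules are interested in an event and computing how long that interest lasts, can be sketched as follows. The `Rule` fields, the subnet-only condition, and the function names are simplifying assumptions for illustration; a real rule would carry richer conditions.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    subnet: str          # condition: source address lies in this subnet
    window_seconds: int  # time window associated with the rule


def interested_rules(event, rules):
    """Return the rules whose condition mentions this event's source address."""
    src = ipaddress.ip_address(event["src_ip"])
    return [r for r in rules if src in ipaddress.ip_network(r.subnet)]


def partial_match(event, rules):
    """Pair the event with each interested rule and the last moment at which
    the event is still relevant to that rule (event time plus rule window)."""
    return [
        (r.name, event["time"] + r.window_seconds)
        for r in interested_rules(event, rules)
    ]
```

An event that no rule is interested in yields an empty list and would not be forwarded for further processing.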

The memory manager 420 keeps track of the events that are operated on by RETE engine 430. Aggregated events are passed from the partial matcher 410 to the memory manager 420. The aggregated events can have an expiration time (i.e., the aggregated event is only of interest for a period of time). Memory manager 420 provides events to RETE engine 430 and deletes each event when it reaches its expiration time. In other words, memory manager 420 feeds events to, and deletes events from, the RETE engine 430 as needed to provide statefulness.
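
The feed-and-expire behavior can be sketched with a priority queue ordered by expiration time. This is one possible arrangement, not necessarily the one used by memory manager 420; the class name, the `working_set` list standing in for the engine's stored events, and the method names are assumptions.

```python
import heapq
import itertools


class MemoryManager:
    """Hypothetical sketch: feeds events to an engine and retracts them on expiry."""

    def __init__(self):
        self._heap = []                # entries: (expires_at, sequence, event)
        self._seq = itertools.count()  # tie-breaker so events are never compared
        self.working_set = []          # stands in for events held by the engine

    def add(self, event, expires_at):
        """Provide an event to the engine and remember when it expires."""
        heapq.heappush(self._heap, (expires_at, next(self._seq), event))
        self.working_set.append(event)

    def expire(self, now):
        """Delete every event whose expiration time has been reached."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            _, _, event = heapq.heappop(self._heap)
            self.working_set.remove(event)
            expired.append(event)
        return expired
```

Calling `expire` periodically keeps the engine's state bounded to events that can still contribute to a rule match.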

RETE engine 430 implements the RETE algorithm, which scales to many hundreds of rules while its performance is independent of the number of rules it considers. In operation, RETE engine 430 loads user-written, time-based rules. Once a rule is active, RETE engine 430 analyzes the events provided by memory manager 420 and generates a first stage meta-event, which can result in the performance of an action in response to the correlated events. More specifically, the RETE engine 430 reports instances where the rules are satisfied.

Rules manager 490 provides user-defined, time-based rules to the RETE engine 430. The user-defined rules are generated via console/browser interfaces 16. Furthermore, a user can provide instructions to rules manager 490 for activating or deactivating rules dynamically.

Time tracker 440 allows the RETE engine 430 to process time-based rules. Time-based rules are triggered when events that occur over a period of time are collectively recognized as being associated with similar occurrences. Time tracker 440 receives meta-events generated by RETE engine 430 and groups related meta-events together. This correlation of the meta-events is used to determine if the threshold of a time-based rule is reached. For example, time-based rules can require that an event occur ten times in an hour to signify that an action need be performed. If that threshold is not reached, the group of events is terminated. If the threshold is reached, then a second stage meta-event is generated. Time tracker 440 communicates with action engine 450, which executes the actions specified in the rule loaded into RETE engine 430. Meta-events are a hypothesized description of the real world scenario behind what various sensors of security devices independently report as events. These meta-events may then be fed back into rules engine 18 to be used as events. Action engine 450 can notify the user that a meta-event occurred via email, website, or with a notifier on the console. Additionally, time tracker 440 reports a third stage meta-event if no second stage meta-events occur or when second stage meta-events cease to occur. The third stage meta-event detects the end of a security attack and indicates the magnitude of the attack.

Although the functional blocks of a rules engine are depicted in one embodiment within rules engine 18, one or more of these functional blocks can be distributed in other systems. Rules engine 18 has additional functionalities, such as detection of improper rule syntax, loop detection, rule feedback detection, aggregation of joint events, and timeline alignment. Detection of improper rule syntax allows for the deactivation of rules that are abusive to the system, such as rules that consume memory or CPU inefficiently. Rules engine 18 can also detect if a user-defined rule generates a loop condition in which the same events are provided repeatedly to the RETE engine 430. Similarly, rules engine 18 can detect rule feedback. Rule feedback occurs when a meta-event that is generated by rules engine 18 is fed back into rules engine 18 and results in an abusive or destructive consumption of memory or CPU processing.
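
The text does not say how rule feedback is detected. One simple possibility, shown here purely as an assumed illustration, is to record on each meta-event how many generations of meta-events it was derived from and to flag any meta-event whose derivation depth exceeds a limit. The `depth` field, the `MAX_DEPTH` value, and both function names are hypothetical.

```python
MAX_DEPTH = 3  # assumed limit on how many times output may be fed back


def derive_meta_event(rule_name, source_events):
    """Create a meta-event one level deeper than its deepest source event.
    Base events carry no 'depth' field and count as depth 0."""
    depth = 1 + max((e.get("depth", 0) for e in source_events), default=0)
    return {"rule": rule_name, "depth": depth}


def is_feedback(meta_event, max_depth=MAX_DEPTH):
    """Flag a meta-event that results from too many rounds of feedback."""
    return meta_event["depth"] > max_depth
```

A rule whose meta-events are flagged in this way could then be deactivated, in the same manner as a rule with improper syntax.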

FIG. 5 illustrates procedures followed by a rules engine 18 in accordance with an embodiment of the present invention. Partial matcher 410 receives an event (step 510) and determines if the event is interesting as described above (decision block 530). If the event is not of interest, the next event is received by partial matcher 410 (step 520) and the process determines if the next event is interesting (decision block 530). If an event is determined to be of interest, these interesting events are aggregated by partial matcher 410 (step 530). In parallel with the processing performed by partial matcher 410, rules manager 490 compiles a user-defined, time-based rule (step 545) associated with the aggregated events of interest. The compiled rule and aggregated events are provided to RETE engine 430 (step 550) for a time period specified by the rule. If the time period expires, the time-based rule is no longer provided to the RETE engine 430.

Upon receipt of the compiled rule and interesting events, the RETE engine 430 determines if one or more of the aggregated events matches a processed rule (decision block 560). If no matches occur, then new events are received at rules engine 18 (step 520), and the process described above is repeated. If a match occurs, then a first stage meta-event is generated (step 570). Although not shown, first stage meta-events can be reported via the console 16 or with another reporting mechanism described above.

First stage meta-events are aggregated and processed by time tracker 440 to determine whether a threshold level of repeat matches of similar interesting events occurs in subsequent time periods (decision block 580). For example, if five first stage meta-events occurred in a first ten minute time period, time tracker 440 determines if five more first stage meta-events occur in a subsequent ten minute time period. If the threshold is met (i.e., five meta-events occur in ten minutes), then a second stage meta-event is generated (step 590). After generating a second stage meta-event, the process described above repeats by accepting events at rules engine 18 (step 520). Additional second stage meta-events can occur; however, if no second stage meta-events occur, or if they stop occurring in subsequent time periods, a third stage meta-event is generated (step 599). The generation of a third stage meta-event signifies that a security attack on the system has ended. It also measures how long an attack lasted, and the attack's magnitude in terms of the number of network computers attacked. Both second and third stage meta-events can be reported as actions performed by action engine 450. By reporting second stage meta-events, the detection of repetitive behavior attacks and enterprise wide attacks is improved. Process 500 can occur within the process shown in FIG. 3, above. Specifically, process 500 describes how event data is processed through rules engine 18, as shown at step 42 of FIG. 3.
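
The progression from first stage to second and third stage meta-events can be sketched as a small state machine. The sketch makes several assumptions that are not taken from the text: times are plain seconds, the counter resets after each second stage meta-event, only the "cease to occur" case of the third stage is handled, and attack magnitude is approximated by the count of second stage meta-events. The class and method names are hypothetical.

```python
class TimeTracker:
    """Hypothetical sketch of second and third stage meta-event generation."""

    def __init__(self, threshold=5, period=600, quiet_period=600):
        self.threshold = threshold        # first stage meta-events required
        self.period = period              # seconds in which they must occur
        self.quiet_period = quiet_period  # silence that marks the attack's end
        self.first_stage_times = []
        self.second_stage_times = []

    def on_first_stage(self, time):
        """Record a first stage meta-event; return a second stage one or None."""
        self.first_stage_times = [
            t for t in self.first_stage_times if time - t < self.period
        ]
        self.first_stage_times.append(time)
        if len(self.first_stage_times) >= self.threshold:
            self.first_stage_times.clear()
            self.second_stage_times.append(time)
            return {"stage": 2, "time": time}
        return None

    def on_tick(self, now):
        """Return a third stage meta-event once second stage ones have ceased."""
        if not self.second_stage_times:
            return None
        last = self.second_stage_times[-1]
        if now - last < self.quiet_period:
            return None
        summary = {
            "stage": 3,
            "time": now,
            "second_stage_count": len(self.second_stage_times),
            "duration": last - self.second_stage_times[0],
        }
        self.second_stage_times.clear()
        return summary
```

In use, `on_first_stage` would be called for each first stage meta-event and `on_tick` on a timer; the returned dictionaries would be handed to whatever component performs the reporting actions.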

Thus, a computer-based system for capturing, correlating, and reporting security events from heterogeneous sources, including a correlation engine with support for time-based rules, has been described. In the foregoing description, the various examples and embodiments were meant to be illustrative of the present invention and not restrictive in terms of their scope. Accordingly, the invention should be measured only in terms of the claims, which follow.

1-41. (canceled)
 42. A computer-implemented method, comprising: generating first stage meta-events based on analyzing time attributes of base events received from at least one network component; generating second stage meta-events based on a number of the first stage meta-events that have a time attribute falling within a time period; determining an amount of time that has passed since a most-recent second stage meta-event was generated; and if a threshold time period does not exceed the amount of time that has passed since the most-recent second stage meta-event was detected, generating a third stage meta-event.
 43. The method of claim 42, wherein generating first stage meta-events comprises: identifying a first rule that indicates a threshold number of base events and a first time period; determining how many base events include a time attribute that falls within the first time period; and when the threshold number of base events does not exceed the number of base events whose time attributes fall within the first time period, generating a first stage meta-event.
 44. The method of claim 43, comprising: activating the first rule dynamically.
 45. The method of claim 43, comprising: detecting improper rule syntax.
 46. The method of claim 43, comprising: detecting a loop condition generated by the first rule.
 47. The method of claim 43, comprising: detecting rule feedback.
 48. The method of claim 43, comprising: filtering the base events based on a condition before determining how many base events include a time attribute that falls within the first time period.
 49. The method of claim 48, wherein filtering the base events based on the condition comprises discarding the base events that do not satisfy the condition.
 50. The method of claim 42, wherein the at least one network component comprises an intrusion detection system.
 51. The method of claim 42, comprising: aligning time of the base events generated by different network components.
 52. The method of claim 42, comprising: performing an action specified by the first rule to notify an individual of the first-stage meta-events.
 53. A system, comprising: data storage to store base events received from at least one network component; and a processor to generate first stage meta-events based on analyzing time attributes of base events received from at least one network component, generating second stage meta-events based on a number of the first stage meta-events that have a time attribute falling within a time period, determine an amount of time that has passed since a most-recent second stage meta-event was generated, and if a threshold time period does not exceed the amount of time that has passed since the most-recent second stage meta-event was detected, generate a third stage meta-event.
 54. The system of claim 53, wherein the processor to generate first stage meta-events comprises the processor to identify a first rule that indicates a threshold number of base events and a first time period, determine how many base events include a time attribute that falls within the first time period, and when the threshold number of base events does not exceed the number of base events whose time attributes fall within the first time period, generate a first stage meta-event.
 55. The system of claim 54, wherein the processor is to activate the first rule dynamically.
 56. The system of claim 54, wherein the processor is to detect improper rule syntax.
 57. The system of claim 54, wherein the processor is to detect a loop condition generated by the first rule.
 58. The system of claim 54, wherein the processor is to filter the base events based on a condition before determining how many base events include a time attribute that falls within the first time period.
 59. The system of claim 58, wherein the processor is to discard the base events that do not satisfy the condition.
 60. A non-transitory computer readable medium storing machine readable instructions that when executed perform instructions to: generate first stage meta-events based on analyzing time attributes of base events received from at least one network component; generate second stage meta-events based on a number of the first stage meta-events that have a time attribute falling within a time period; determine an amount of time that has passed since a most-recent second stage meta-event was generated; and if a threshold time period does not exceed the amount of time that has passed since the most-recent second stage meta-event was detected, generate a third stage meta-event. 